A Sequence-Focused Parallelisation of EMBOSS on a Cluster of Workstations
Abstract
A number of individual bioinformatics applications (particularly BLAST and other sequence-searching methods) have recently been implemented on clusters of workstations to take advantage of the additional processing power. These implementations achieve performance improvements for increasingly large sets of input data (sequences and databases). We present an analysis of programs in the EMBOSS suite as sequence size increases, and implement these programs in parallel over a cluster of workstations using sequence segmentation with overlap. We observe general increases in runtime for all programs, and examine the speedup for the most intensive ones to establish an optimum segmentation size for those programs across the cluster.
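Sequence segmentation with overlap can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`segment_with_overlap`, `find_motif`, `parallel_motif_search`), the exact-motif search standing in for an EMBOSS program, and the local thread pool standing in for cluster nodes are all hypothetical. Adjacent segments share `len(motif) - 1` residues, so every match lies wholly inside at least one segment; local hit positions are then shifted by each segment's offset and deduplicated.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def segment_with_overlap(seq: str, segment_size: int, overlap: int) -> List[Tuple[int, str]]:
    """Split seq into segments of at most segment_size residues, with
    consecutive segments sharing `overlap` residues.

    Returns (offset, subsequence) pairs, where offset is the 0-based start
    of the segment in seq.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if not 0 <= overlap < segment_size:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_size")
    step = segment_size - overlap
    segments = []
    start = 0
    while True:
        end = min(start + segment_size, len(seq))
        segments.append((start, seq[start:end]))
        if end == len(seq):  # last segment reaches the end of seq
            break
        start += step
    return segments


def find_motif(segment: str, motif: str) -> List[int]:
    """Hypothetical stand-in for a per-segment program: the 0-based start of
    every exact (possibly overlapping) occurrence of motif in segment."""
    hits = []
    i = segment.find(motif)
    while i != -1:
        hits.append(i)
        i = segment.find(motif, i + 1)
    return hits


def parallel_motif_search(seq: str, motif: str, segment_size: int, workers: int = 4) -> List[int]:
    """Hypothetical scatter/gather driver; threads stand in for cluster nodes."""
    # An overlap of len(motif) - 1 means every occurrence of motif lies
    # wholly inside at least one segment (requires segment_size >= len(motif)).
    segments = segment_with_overlap(seq, segment_size, len(motif) - 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_segment = list(pool.map(lambda seg: find_motif(seg[1], motif), segments))
    # Shift local hit positions by each segment's offset; the set drops
    # duplicates reported by two segments that share an overlap region.
    merged = set()
    for (offset, _), hits in zip(segments, per_segment):
        merged.update(offset + h for h in hits)
    return sorted(merged)


seq = "ACGTACGTTTACGTACG"
print(parallel_motif_search(seq, "ACG", segment_size=5))  # [0, 4, 10, 14]
```

With this choice of overlap, the merged result matches a single search over the whole sequence. The threads only illustrate the scatter/gather structure: under CPython's global interpreter lock they give no speedup for this CPU-bound search, whereas separate cluster nodes could. For each candidate segment size, speedup would be measured as the ratio of serial to parallel runtime, T_1 / T_p.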
Similar resources
An overview of the wcd EST clustering tool
The wcd system is an open-source tool for clustering expressed sequence tags (ESTs) and other DNA and RNA sequences. wcd allows efficient all-versus-all comparison of ESTs using either the d2 distance function or edit distance, improving on existing implementations of d2. It supports merging, refinement and reclustering of clusters. It is 'drop in' compatible with the StackPack clust...
Website Update: A New Graphical User Interface to EMBOSS
EMBOSS, the European Molecular Biology Open Software Suite is a collection of over 150 bioinformatics programs. It provides individual applications for retrieval, editing, and analysis of nucleotide and peptide sequences. It also includes software for analysing protein structure. Distributed under the GNU General Public License software agreements [6], all programs are available, free of charge...
PARSAR: Parallelisation of a Chirp Scaling Algorithm SAR Processor
A parallel SAR processor is presented in this paper. The target configuration is a cluster of UNIX workstations, available at most user sites. This allows increased computing performance to be obtained without investment in dedicated hardware.
Semantics-Based Composition of EMBOSS Services with Bio-jETI
Bio-jETI is a framework for the model-based, graphical design, execution and management of bioinformatics analysis processes. Formal methods such as automatic service composition extend the framework and, in particular, allow semantically aware workflow development. In this study we apply the workflow synthesis methodology to the EMBOSS suite of sequence analysis tools. As neither the tool s...
An Analysis Based on Heuristic Method Using CLUSTAL and EMBOSS
Multiple Sequence Alignment (MSA) methods aim to align several sequences together and find similarities or differences, and to infer structural, functional or evolutionary relationships. Because the well-known dynamic programming algorithm for solving the multiple sequence alignment problem has exponential time complexity, heuristic approaches are commonly used. In this paper, multiple se...